Sequence alignment and mutual information

Authors

  • Orion Penner
  • Peter Grassberger
  • Maya Paczuski

Abstract

Background: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. All existing alignment algorithms rely on heuristic scoring schemes based on biological expertise. Therefore, these algorithms do not provide model-independent, objective measures of how similar two (or more) sequences actually are. Although information theory provides such a similarity measure, the mutual information (MI), previous attempts to connect sequence alignment and information theory have not produced realistic estimates of the MI from a given alignment.

Results: Here we describe a simple and flexible approach to obtain robust estimates of MI from global alignments. For mammalian mitochondrial DNA, our approach gives pairwise MI estimates for commonly used global alignment algorithms that are strikingly close to estimates obtained by an entirely unrelated approach: concatenating and zipping the sequences.

Conclusions: This remarkable consistency may help establish MI as a reliable tool for evaluating the quality of global alignments, judging the relative merits of different alignment algorithms, and estimating the significance of specific alignments. We expect that our approach can be extended to establish further connections between information theory and sequence alignment, including applications to local and multiple alignment procedures.
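The abstract names two routes to an MI estimate without spelling out either estimator. The Python sketch below is a minimal illustration under stated assumptions and is not the authors' method. `plugin_mi_per_site` computes the textbook plug-in MI from the symbol pairs in the columns of a global alignment (skipping columns with a gap is an assumption), and `compression_mi` implements the generic compression estimate I(x;y) ≈ C(x) + C(y) - C(xy), with Python's `zlib` (DEFLATE) standing in for "zipping". All function names, including the `random_dna` helper, are hypothetical.

```python
import math
import random
import zlib
from collections import Counter


def plugin_mi_per_site(aligned_x: str, aligned_y: str, gap: str = "-") -> float:
    """Naive plug-in MI estimate, in bits per aligned column.

    Columns with a gap in either sequence are skipped; treating the gap as
    an extra symbol would be an equally valid (hypothetical) choice.
    """
    if len(aligned_x) != len(aligned_y):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(a, b) for a, b in zip(aligned_x, aligned_y) if a != gap and b != gap]
    n = len(pairs)
    if n == 0:
        return 0.0
    joint = Counter(pairs)             # counts of aligned symbol pairs (a, b)
    px = Counter(a for a, _ in pairs)  # marginal counts for sequence x
    py = Counter(b for _, b in pairs)  # marginal counts for sequence y
    # I(X;Y) = sum p(a,b) log2[p(a,b) / (p(a) p(b))], with p = count / n,
    # so the ratio inside the log simplifies to c * n / (px[a] * py[b]).
    return sum(
        (c / n) * math.log2(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


def compressed_bits(s: str) -> int:
    """Length of the DEFLATE (zlib, level 9) compressed string, in bits."""
    return 8 * len(zlib.compress(s.encode("ascii"), 9))


def compression_mi(x: str, y: str) -> int:
    """Compression-based MI estimate in bits: C(x) + C(y) - C(x + y)."""
    return compressed_bits(x) + compressed_bits(y) - compressed_bits(x + y)


def random_dna(n: int, seed: int) -> str:
    """Hypothetical helper: an i.i.d. uniform random DNA string."""
    return "".join(random.Random(seed).choices("ACGT", k=n))
```

For example, `plugin_mi_per_site("ACGT", "ACGT")` returns `2.0`, the full 2 bits of a uniform four-letter column, and `compression_mi(x, x)` is far larger than `compression_mi(x, y)` for two unrelated random strings. Both numbers are rough: the plug-in estimator is biased upward on short alignments and ignores information carried by the gap pattern, while DEFLATE adds a fixed header overhead (counted twice in C(x) + C(y) but once in C(xy)) and only searches back 32 KiB, so a dedicated DNA compressor would be a better choice for real genomes.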

Similar resources

JBIO_Proteins Sequence Alignment

Abstract: Alignment of biological sequences such as DNA, RNA or proteins is one of the most widely used tools in computational bioscience. One of the important research topics of bioinformatics is multiple protein sequence alignment. Since the exact methods for MSA have exponential time complexity, heuristic approaches and progressive alignment are the most commonly used in multiple...

InterMap3D: predicting and visualizing co-evolving protein residues

SUMMARY InterMap3D predicts co-evolving protein residues and plots them on the 3D protein structure. Starting with a single protein sequence, InterMap3D automatically finds a set of homologous sequences, generates an alignment and fetches the most similar 3D structure from the Protein Data Bank (PDB). It can also accept a user-generated alignment. Based on the alignment, co-evolving residues ar...

Substitution Matrices and Mutual Information Approaches to Modeling Evolution

Substitution matrices are at the heart of Bioinformatics: sequence alignment, database search, phylogenetic inference, protein family classification are based on Blosum, Pam, JTT, mtREV24 and other matrices. These matrices provide means of computing models of evolution and assessing the statistical relationships amongst sequences. This paper reports two results; first we show how Bayesian and grid...

MISTIC: mutual information server to infer coevolution

MISTIC (mutual information server to infer coevolution) is a web server for graphical representation of the information contained within an MSA (multiple sequence alignment) and a complete analysis tool for Mutual Information networks in protein families. The server outputs a graphical visualization of several information-related quantities using a circos representation. This provides an integra...

Inferring protein-DNA dependencies using motif alignments and mutual information

MOTIVATION Mutual information can be used to explore covarying positions in biological sequences. In the past, it has been successfully used to infer RNA secondary structure conformations from multiple sequence alignments. In this study, we show that the same principles allow the discovery of transcription factor amino acids that are coevolving with nucleotides in their DNA-binding targets. R...

Publication date: 2008